Revisiting gap locations in amino acid sequence alignments and a proposal for a method to improve them by introducing solvent accessibility
Abstract
In comparative modeling, the quality of amino acid sequence alignment still constitutes a major bottleneck in the generation of high-quality models of protein three-dimensional (3D) structures. Substantial efforts have been made to improve alignment quality by revising the substitution matrix, introducing multiple sequences, replacing dynamic programming with hidden Markov models, and incorporating 3D structure information. However, improvements in the gap penalty have not been a major focus since the development of the affine gap penalty and of the secondary-structure-dependent gap penalty. We revisited the correlation between protein 3D structure and gap location in a large protein 3D structure data set, and found that the frequency of gap locations approximated an exponential function of the solvent accessibility of the inserted residues. The nonlinearity of the gap frequency as a function of accessibility corresponded well to the relationship between residue mutation pattern and residue accessibility. By introducing this relationship into the gap penalty calculation for pairwise alignment between template and target amino acid sequences, we were able to obtain a sequence alignment much closer to the structural alignment. The quality of the alignments was substantially improved on a pair of sequences with identity in the "twilight zone" between 20 and 40%. The relocation of gaps by our new method made a significant improvement in comparative modeling, exemplified here by the Bacillus subtilis yitF protein. The method was implemented in a computer program, ALAdeGAP (ALignment with Accessibility dependent GAp Penalty), which is available at http://cib.cf.ocha.ac.jp/target_protein/.
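The idea in the abstract can be illustrated with a minimal sketch. It is not the ALAdeGAP implementation: the abstract does not give the penalty formula, so the functional form below is an assumption. If gap frequency grows roughly as exp(rate * accessibility), a log-odds-style penalty falls linearly with accessibility. The names `gap_penalty` and `align_score`, the parameters `base` and `rate`, the match/mismatch scores, and the use of a linear (non-affine) gap cost are all hypothetical choices made for brevity.

```python
def gap_penalty(accessibility, base=8.0, rate=3.0):
    """Hypothetical penalty for a gap next to a residue with the given
    relative solvent accessibility (0 = buried, 1 = fully exposed).

    Assumption: gap frequency ~ exp(rate * accessibility), so the
    negative log frequency is linear in accessibility.
    """
    return base - rate * accessibility


def align_score(template, target, acc, match=2.0, mismatch=-1.0):
    """Needleman-Wunsch global alignment score in which the gap cost
    depends on the accessibility of the nearest template residue.

    template, target: amino acid strings; acc: one accessibility value
    per template residue. Simplification: linear gaps, identity scoring.
    """
    n, m = len(template), len(target)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] - gap_penalty(acc[i - 1])
    for j in range(1, m + 1):
        # Assumption: leading insertions are charged at the first residue.
        score[0][j] = score[0][j - 1] - gap_penalty(acc[0])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if template[i - 1] == target[j - 1] else mismatch
            diag = score[i - 1][j - 1] + sub
            # Template residue i has no partner in the target.
            up = score[i - 1][j] - gap_penalty(acc[i - 1])
            # Target residue j is inserted after template residue i.
            left = score[i][j - 1] - gap_penalty(acc[i - 1])
            score[i][j] = max(diag, up, left)
    return score[n][m]


if __name__ == "__main__":
    # The same one-residue deletion is cheaper at an exposed position.
    print(align_score("ACDE", "ADE", [0.0, 1.0, 0.0, 0.0]))
    print(align_score("ACDE", "ADE", [0.0, 0.0, 0.0, 0.0]))
```

In this sketch, a gap opposite an exposed template residue costs less than one opposite a buried residue, so the dynamic programming step tends to move gaps toward the protein surface. A real implementation would use a proper substitution matrix and affine gap costs.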
Similar resources
SSALN: an alignment algorithm using structure-dependent substitution matrices and gap penalties learned from structurally aligned protein pairs.
In template-based modeling of protein structures, the generation of the alignment between the target and the template is a critical step that significantly affects the accuracy of the final model. This paper proposes an alignment algorithm SSALN that learns substitution matrices and position-specific gap penalties from a database of structurally aligned protein pairs. In addition to the amino a...
Context similarity scoring improves protein sequence alignments in the midnight zone
MOTIVATION: High-quality protein sequence alignments are essential for a number of downstream applications such as template-based protein structure prediction. In addition to the similarity score between sequence profile columns, many current profile-profile alignment tools use extra terms that compare 1D-structural properties such as secondary structure and solvent accessibility, which are pred...
Development and Sequence Analysis of a Cold-Adapted Strain of Influenza A/New Caledonia/20/1999(H1N1) Virus
Background and Aims: Vaccination is the most effective method to prevent influenza infection. Among the available vaccines, cold-adapted live-virus vaccines are a suitable approach and have been produced and evaluated in recent years in a few countries. The goal of this project was to derive a cold-adapted variant of influenza A/New Caledonia/20/1999(H1N1). Materials and Methods: Influenza ...
Accounting for solvent accessibility and secondary structure in protein phylogenetics is clearly beneficial.
Amino acid substitution models are essential to most methods to infer phylogenies from protein data. These models represent the ways in which proteins evolve and substitutions accumulate along the course of time. It is widely accepted that the substitution processes vary depending on the structural configuration of the protein residues. However, this information is very rarely used in phylogene...
Predicting solvent accessibility: higher accuracy using Bayesian statistics and optimized residue substitution classes.
We introduce a novel Bayesian probabilistic method for predicting the solvent accessibilities of amino acid residues in globular proteins. Using single sequence data, this method achieves prediction accuracies higher than previously published methods. Substantially improved predictions, comparable to the highest accuracies reported in the literature to date, are obtained by representing alignment...